Testing Hypotheses by Regularized Maximum Mean Discrepancy

Authors

  • Somayeh Danafar
  • Paola M. V. Rancoita
  • Tobias Glasmachers
  • Kevin Whittingstall
  • Jürgen Schmidhuber
Abstract

Do two data samples come from different distributions? Recent studies of this fundamental problem have focused on embedding probability distributions into sufficiently rich characteristic Reproducing Kernel Hilbert Spaces (RKHSs), so that distributions can be compared by the distance between their embeddings. We show that Regularized Maximum Mean Discrepancy (RMMD), our novel measure for kernel-based hypothesis testing, yields substantial improvements even when sample sizes are small, and excels at hypothesis tests involving multiple comparisons with power control. We derive asymptotic distributions under the null and alternative hypotheses, and assess power control. Outstanding results are obtained on challenging EEG data, MNIST, the Berkeley Covertype, and the Flare-Solar dataset.
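As a rough illustration of the embedding-distance idea in the abstract, the sketch below computes the plain (unregularized) biased MMD² estimate with a Gaussian kernel and calibrates it with a permutation test. This is not the RMMD statistic proposed in the paper, whose regularization and asymptotic null distribution are not given on this page. The function names, the bandwidth, and the number of permutations are illustrative assumptions.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def mmd2_biased(x, y, bandwidth=1.0):
    """Biased estimate of squared MMD: the squared RKHS distance
    between the empirical mean embeddings of x and y."""
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    return kxx.mean() + kyy.mean() - 2.0 * kxy.mean()


def permutation_test(x, y, bandwidth=1.0, n_permutations=200, seed=0):
    """Permutation p-value for H0: x and y share one distribution.
    (Assumption: permutation calibration stands in for the asymptotic
    null distribution derived in the paper.)"""
    rng = np.random.default_rng(seed)
    observed = mmd2_biased(x, y, bandwidth)
    pooled = np.vstack([x, y])
    n = len(x)
    count = 0
    for _ in range(n_permutations):
        perm = rng.permutation(len(pooled))
        px, py = pooled[perm[:n]], pooled[perm[n:]]
        if mmd2_biased(px, py, bandwidth) >= observed:
            count += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value
```

Calling `permutation_test(x, y)` on two arrays of shape `(n_samples, n_features)` returns the observed statistic and a p-value. A small p-value suggests that the two samples come from different distributions. In practice the bandwidth is often set from the data, for example by the median pairwise distance, rather than fixed at 1.0.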

Similar resources

Nonparametric Composite Hypothesis Testing in an Asymptotic Regime

We investigate the nonparametric, composite hypothesis testing problem for arbitrary unknown distributions in the asymptotic regime where the number of hypotheses grows exponentially large. This type of asymptotic analysis is important in many practical problems, where the number of variations that can exist within a family of distributions can be countably infinite. We introduce the notion ...

Fuzzy decision in testing hypotheses by fuzzy data: Two case studies

In testing hypotheses, we may encounter cases where data are recorded as non-precise (fuzzy) rather than crisp. In such situations, the classical methods of testing hypotheses are inadequate and need to be generalized. When testing hypotheses based on fuzzy data, the fuzziness of the observed data leads to a fuzzy p-value. This paper focuses on calculating fuz...

TESTING STATISTICAL HYPOTHESES UNDER FUZZY DATA AND BASED ON A NEW SIGNED DISTANCE

This paper deals with the problem of testing statistical hypotheses when the available data are fuzzy. In this approach, we first obtain a fuzzy test statistic based on fuzzy data, and then, based on a new signed distance between fuzzy numbers, we introduce a new decision rule to accept/reject the hypothesis of interest. The proposed approach is investigated for two cases: the case without nuisance p...

Minimax Estimation of Maximum Mean Discrepancy with Radial Kernels

Maximum Mean Discrepancy (MMD) is a distance on the space of probability measures which has found numerous applications in machine learning and nonparametric testing. This distance is based on the notion of embedding probabilities in a reproducing kernel Hilbert space. In this paper, we present the first known lower bounds for the estimation of MMD based on finite samples. Our lower bounds hold...

Improved Nonparametric Discriminant Analysis for Hyperspectral Image Classification with Limited Training Samples

Feature extraction plays an important role in improving hyperspectral image classification. Compared with parametric methods, nonparametric feature extraction methods perform better when classes are not normally distributed. In addition, these methods can extract more features than parametric feature extraction methods can. Nonparametric feature extraction methods use nonparametric s...

Journal title:
  • CoRR

Volume abs/1305.0423

Publication date 2013